Learning Rules for Layout Analysis Correction

Authors

  • Donato Malerba
  • Floriana Esposito
  • Oronzo Altamura
Abstract

Layout analysis is the process of extracting a hierarchical structure describing the layout of a page. In the document processing system WISDOM++, layout analysis is performed in two steps: first, the global analysis determines possible areas containing paragraphs, sections, columns, figures and tables; second, the local analysis groups together blocks that possibly fall within the same area. The result of the local analysis strongly depends on the quality of the results of the first step. In this paper we investigate the possibility of supporting the user during the correction of the results of the global analysis. This is done by allowing the user to correct the results of the global analysis and then by learning rules for layout correction from the user's sequence of actions. Preliminary experimental results on a set of multi-page documents are reported.

1. Background and motivations

Strategies for the extraction of the layout structure have traditionally been classified as top-down or bottom-up [6]. In top-down methods the document image is repeatedly decomposed into smaller and smaller components, while in bottom-up methods basic layout components are extracted from bitmaps and then grouped together into larger blocks on the basis of their characteristics. In WISDOM++ (www.di.uniba.it/~malerba/wisdom++/), a document image analysis system that can transform paper documents into either HTML or XML format [1], the applied page decomposition method is hybrid, since it combines a variant of the RLSA (run length smoothing algorithm) [5] to segment the document image and a bottom-up layout analysis method to assemble basic blocks into larger components called frames. The layout analysis is done in two steps:

  1. A global analysis of the document image in order to determine possible areas containing paragraphs, sections, columns, figures and tables. This step is based on an iterative process, in which the vertical and horizontal histograms of text blocks are alternately analyzed, in order to detect columns and sections/paragraphs, respectively.
  2. A local analysis of the document to group together blocks that possibly fall within the same area. Three perceptual criteria are considered in this step: proximity (e.g. adjacent components belonging to the same column/area are equally spaced), continuity (e.g. overlapping components) and similarity (e.g. components of the same type, with an almost equal height).

Experimental results proved the effectiveness of the approach on images of the first page of papers published in either conference proceedings or journals. However, performance degrades when the system is tested on intermediate pages of multi-page articles, where the structure is much more variable, due to the presence of formulae, images, and drawings that can stretch over more than one column or lie quite close to each other. The main source of the errors made by the layout analysis module is the global analysis step, while the local analysis step performs satisfactorily when the result of the global analysis is correct.

In this paper, we investigate the possibility of supporting the user during the correction of the results of the global analysis. This is done by means of two new system facilities:

  1. the user can correct the results of the layout analysis by either grouping or splitting the columns/sections automatically produced by the global analysis;
  2. the user can ask the system to learn grouping/splitting rules from his/her sequence of actions, which correct the results of the layout analysis.

In the following section, the layout correction operations are described and the automated generation of training examples is explained. Section 3 briefly introduces the learning system used to generate layout correction rules, while some preliminary experimental results are reported in Section 4.
Figure 1. Tree structure of the columns and sections determined by WISDOM++.

2. Correcting the results of the global analysis

Global analysis aims at determining the general layout structure of a page and operates on a tree-based representation of nested columns and sections (see Figure 1). The levels of columns and sections alternate: a column contains sections, while a section contains columns. At the end of the global analysis process, the user can see only those sections and columns that have been considered atomic, that is, not subject to further decomposition (see Figure 2). The user can correct this result by means of three different operations:

  • Horizontal splitting: a column/section is cut horizontally.
  • Vertical splitting: a column/section is cut vertically.
  • Grouping: two sections/columns are merged together.

The cut point in the two splitting operations is automatically determined by computing either the horizontal or the vertical histogram on the basic blocks output by the segmentation algorithm. The horizontal (vertical) cut point corresponds to the largest gap between two consecutive bins in the horizontal (vertical) histogram. Therefore, splitting operations can be described by means of a binary function, namely split(X,S), where X represents the column/section to be split, S is an ordinal number representing the step of the correction process, and the range of the split function is the set {horizontal, vertical, no-split}.

The grouping operation, which can be described by the ternary predicate group(A,B,S), is applicable to two sections (columns) A and B and returns a new section (column) C, whose boundary is determined as follows. Let (left_X, top_X) and (right_X, bottom_X) be the coordinates of the top-left and bottom-right vertices of a column/section X, respectively. Then:

  left_C = min(left_A, left_B), right_C = max(right_A, right_B),
  top_C = min(top_A, top_B), bottom_C = max(bottom_A, bottom_B).

Figure 2. Results of the global analysis process: one column (above) includes two sections (below). The result of the global analysis process is in the background.

Grouping is possible only if the following two conditions are satisfied:

  1. C does not overlap another section (column) in the document.
  2. A and B are nested in the same column (section).

Immediately after each splitting/grouping operation, WISDOM++ recomputes the result of the local analysis process, so that the user can immediately perceive the final effect of the requested correction and decide whether to confirm or reject it.

From the user interaction, WISDOM++ implicitly generates training observations describing when and how the user intended to correct the result of the global analysis. These training observations are used to learn rules for correcting the result of the global analysis, as explained below.

3. Learning rules for layout correction

Rules for the automated correction of the layout analysis can be learned automatically by means of the system ATRE [3]. The learning problem solved by ATRE can be briefly formulated as follows:

Given
  • a set of concepts C_1, C_2, ..., C_r to be learned,
  • a set of observations O described in a language L_O,
  • a background knowledge BK defined in a language L_BK,
  • a language of hypotheses L_H,
  • a user's preference criterion PC,
Find
  a (possibly recursive) logical theory T for the concepts C_1, C_2, ..., C_r, such that T is complete and consistent with respect to O and satisfies the preference criterion PC.

In the context of global analysis correction, the concepts to be learned are split(X,S)=horizontal, split(X,S)=vertical and group(X,Y,S)=true, since we are interested in finding rules which predict when to split horizontally/vertically or when to group two columns/sections. Therefore, no rule is generated for the cases split(X,S)=nosplit and group(X,Y,S)=false.

The language of observations L_O defines a suitable representation of the global layout structure.
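The splitting and grouping operations of Section 2, whose outcomes are the concepts listed above, can be illustrated with a minimal Python sketch. It is not the WISDOM++ implementation: the Box type, the function names cut_point, overlaps and group, and the choice of placing the cut in the middle of the largest empty gap are assumptions made for illustration (the text only states that the cut corresponds to the largest gap). The nesting condition on A and B is assumed to be checked by the caller.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle (hypothetical type, not taken from WISDOM++)."""
    left: int
    top: int
    right: int
    bottom: int


def cut_point(blocks: List[Box], horizontal: bool) -> Optional[int]:
    """Return the middle of the largest empty gap left by the blocks.

    horizontal=True searches for a horizontal cut (a y coordinate),
    horizontal=False for a vertical cut (an x coordinate).
    Returns None when the blocks leave no empty gap.
    """
    # Project every block onto the relevant axis as an interval.
    spans = sorted((b.top, b.bottom) if horizontal else (b.left, b.right)
                   for b in blocks)
    if not spans:
        return None
    best_gap, best_cut = 0, None
    covered_until = spans[0][1]
    for start, end in spans[1:]:
        gap = start - covered_until
        if gap > best_gap:
            best_gap, best_cut = gap, covered_until + gap // 2
        covered_until = max(covered_until, end)
    return best_cut


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share some area."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def group(a: Box, b: Box, others: List[Box]) -> Optional[Box]:
    """Merge a and b into their bounding box C (min/max rule of the text).

    Returns None when C would overlap one of the other sections (columns).
    """
    c = Box(min(a.left, b.left), min(a.top, b.top),
            max(a.right, b.right), max(a.bottom, b.bottom))
    if any(overlaps(c, o) for o in others if o not in (a, b)):
        return None
    return c


if __name__ == "__main__":
    # Hypothetical coordinates: a thin header block above a large text block.
    blocks = [Box(20, 18, 572, 26), Box(20, 48, 572, 770)]
    print(cut_point(blocks, horizontal=True))   # 37, middle of the gap 26..48
    print(group(blocks[0], blocks[1], []))
```

Working on projected intervals rather than on a binned histogram keeps the sketch short; a binned histogram would give the same cut up to the bin width.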
In this work, the representation of the global layout structure is restricted to the lowest column and section levels in the tree structure extracted by the global analysis (see Figure 1); other levels, as well as their composition hierarchy, are deliberately ignored. Nevertheless, describing this portion of the layout structure is not straightforward, since the columns and sections are spatially related and the feature-vector representation typically adopted in statistical approaches cannot render these relations. For this reason, a first-order logic language is used to describe both documents and rules. In this language, unary function symbols, called attributes, are used to describe properties of a single layout component (e.g., height and width), while binary predicate and function symbols, called relations, are used to express spatial relationships among layout components (e.g., part_of and on_top).

In ATRE, observations are represented by means of ground multiple-head clauses, called objects, which have a conjunction of literals in the head. An example of an observation automatically generated by WISDOM++ follows:

  split(c1,s)=horizontal, group(s1,s2,s)=false, split(s1,s)=nosplit, split(s2,s)=nosplit ←
    step(s)=1,
    type(s1)=section, type(s2)=section, type(c1)=column,
    width(s1)=552, width(s2)=552, width(c1)=552,
    height(s1)=8, height(s2)=723, height(c1)=852,
    x_pos_centre(s1)=296, x_pos_centre(s2)=296, x_pos_centre(c1)=296,
    y_pos_centre(s1)=22, y_pos_centre(s2)=409, y_pos_centre(c1)=426,
    on_top(s1,s2)=true, part_of(c1,s1)=true, part_of(c1,s2)=true,
    no_blocks(s1)=2, no_blocks(s2)=108, no_blocks(c1)=110,
    per_text(s1)=100, per_text(s2)=83, per_text(c1)=84.

This observation describes the first correction applied to a page layout, where two sections and one column were originally found (Figure 2). The horizontal splitting of the column is the first correction performed by the user, as described by the first literal of the body, namely step(s)=1. This col...

Table 1. Distribution of pages and examples per document.

Name of documents | pages | horiz. splits | vert. splits | group | Total ex.s
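As an illustration of how an observation like the one shown in Section 3 could be assembled from the geometry of the layout components, here is a minimal Python sketch. It is not how WISDOM++ generates observations. The Component type, the function names and the geometric tests used for on_top and part_of are assumptions; the readings of no_blocks as a block count and per_text as a text percentage are also guesses from the attribute names. The arrow between head and body is written as "<-" in the output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    """A column or section; all field names are illustrative assumptions."""
    name: str
    kind: str        # "column" or "section"
    left: int
    top: int
    right: int
    bottom: int
    no_blocks: int   # assumed: number of basic blocks in the component
    per_text: int    # assumed: percentage of text blocks


def attribute_literals(c: Component) -> List[str]:
    """Unary attributes describing a single layout component."""
    return [
        f"type({c.name})={c.kind}",
        f"width({c.name})={c.right - c.left}",
        f"height({c.name})={c.bottom - c.top}",
        f"x_pos_centre({c.name})={(c.left + c.right) // 2}",
        f"y_pos_centre({c.name})={(c.top + c.bottom) // 2}",
        f"no_blocks({c.name})={c.no_blocks}",
        f"per_text({c.name})={c.per_text}",
    ]


def relation_literals(components: List[Component]) -> List[str]:
    """Binary relations; the geometric tests below are assumptions."""
    literals = []
    for a in components:
        for b in components:
            if a is b:
                continue
            contains = (a.left <= b.left and a.top <= b.top
                        and a.right >= b.right and a.bottom >= b.bottom)
            if a.kind != b.kind and contains:
                literals.append(f"part_of({a.name},{b.name})=true")
            elif a.kind == b.kind and a.bottom <= b.top:
                literals.append(f"on_top({a.name},{b.name})=true")
    return literals


def observation(head: Dict[str, str], step: int,
                components: List[Component]) -> str:
    """Format one ground multiple-head clause as 'head <- body.'"""
    head_part = ", ".join(f"{lit}={val}" for lit, val in head.items())
    body = [f"step(s)={step}"]
    for c in components:
        body.extend(attribute_literals(c))
    body.extend(relation_literals(components))
    body_part = ", ".join(body)
    return f"{head_part} <- {body_part}."


if __name__ == "__main__":
    # Hypothetical coordinates chosen to reproduce the attribute values
    # of the example observation; they are not taken from the paper.
    s1 = Component("s1", "section", 20, 18, 572, 26, 2, 100)
    s2 = Component("s2", "section", 20, 48, 572, 771, 108, 83)
    c1 = Component("c1", "column", 20, 0, 572, 852, 110, 84)
    head = {"split(c1,s)": "horizontal", "group(s1,s2,s)": "false",
            "split(s1,s)": "nosplit", "split(s2,s)": "nosplit"}
    print(observation(head, 1, [s1, s2, c1]))
```

The literals come out grouped by component rather than by attribute as in the example; since the body is a conjunction, the order does not change its meaning.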





Publication date: 2001